1 Mapview

Figure 1.1: Map of CalEnviroScreen PM2.5 concentration for Bay Area census tracts

Figure 1.2: Map of CalEnviroScreen asthma reports for Bay Area census tracts

2 Scatterplot

Below is a scatterplot of PM2.5 versus asthma reports for each census tract in the Bay Area (Fig 2.1). The fitted line does not describe the data well: the points are not distributed symmetrically on either side of the line, which suggests that the residuals are not normally distributed.

Figure 2.1: Scatterplot for PM2.5 and Asthma in the Bay Area

3 Linear regression

Under a linear regression model (Asthma ~ PM2.5), an increase of 1 unit in PM2.5 concentration is associated with an increase of about 19.86 in asthma emergency department visits. The model explains 9.6% of the variance in asthma reports (R² = 0.096).
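The slope and the share of explained variance quoted above both come from an ordinary least-squares fit. As a minimal sketch of how those two quantities are computed, the Python snippet below fits a line by hand. The helper name `fit_ols` and the five data points are hypothetical and are not the CalEnviroScreen values.

```python
from statistics import mean

def fit_ols(x, y):
    """Return (intercept, slope, r_squared) for the simple regression y ~ x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # change in y per 1-unit change in x
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up illustrative values, NOT the CalEnviroScreen data.
pm25 = [7.0, 8.0, 9.0, 10.0, 11.0]
asthma = [30.0, 70.0, 40.0, 90.0, 60.0]

intercept, slope, r_squared = fit_ols(pm25, asthma)
print(f"slope = {slope:.2f}, R^2 = {r_squared:.3f}")
```

The slope is read as the change in the response per unit of the predictor, and R² is the fraction of the total variance in the response that the fitted values account for.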

4 Residual plot

The residual density plot in Fig 4.1 is not symmetric and has a long tail toward positive values, indicating that the original data are skewed and that the current model does not adequately describe the relationship between the two variables.

Figure 4.1: Residual density plot for linear regression model (Asthma ~ PM2.5)
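A visual judgment of asymmetry can be backed by a number. The sketch below computes the moment-based sample skewness, which is close to 0 for symmetric residuals and positive when there is a long right tail. The function name `skewness` and both residual lists are hypothetical examples, not values from the fitted model.

```python
from statistics import mean

def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (0 for symmetric data)."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])   # second central moment
    m3 = mean([(v - m) ** 3 for v in values])   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical residuals for illustration only.
right_skewed = [-20.0, -15.0, -10.0, -5.0, 0.0, 5.0, 45.0]
symmetric = [-20.0, -10.0, 0.0, 10.0, 20.0]

print(f"right-skewed: {skewness(right_skewed):.2f}")
print(f"symmetric:    {skewness(symmetric):.2f}")
```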

5 Log transformation

The scatterplot of log-transformed PM2.5 and asthma data is shown in Fig 5.1. The points appear more symmetric and more evenly spread around the fitted line than in the previous model.

Figure 5.1: Scatterplot for log-transformed PM2.5 and Asthma in the Bay Area

The residuals of the log-transformed linear model look roughly normally distributed with a mean of 0. Compared with the previous residual plot (Fig 4.1), the distribution is more symmetric, with no skewness or long tail in either direction.

Figure 5.2: Residual density plot for linear regression model (logAsthma ~ logPM2.5)
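In a log-log model the slope is no longer a change per unit. It is approximately the percent change in the response for a 1% change in the predictor. The sketch below illustrates this with synthetic data that follow an exact power law, so the fit recovers the exponent. The helper `fit_line`, the constant 2, and the exponent 1.5 are assumptions made for the illustration, and natural logarithms are assumed.

```python
import math
from statistics import mean

def fit_line(x, y):
    """Least-squares (intercept, slope) for y ~ x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Synthetic values following asthma = 2 * pm25**1.5 exactly (illustration only).
pm25 = [6.0, 8.0, 10.0, 12.0]
asthma = [2.0 * p ** 1.5 for p in pm25]

# Transform both variables with the natural log, then fit a straight line.
log_pm25 = [math.log(p) for p in pm25]
log_asthma = [math.log(a) for a in asthma]

intercept, slope = fit_line(log_pm25, log_asthma)
# slope recovers the exponent; exp(intercept) recovers the multiplicative constant.
print(f"slope = {slope:.3f}, exp(intercept) = {math.exp(intercept):.3f}")
```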

6 Residual map

The difference between the actual asthma reports and the modeled values in each tract is visualized in Fig 6.1. The map shows that the tract with the lowest residual is Stanford in Santa Clara County, with a value of -2.0. A negative residual means the model overestimates asthma reports for the given PM2.5 level. One possible reason for the overestimation is that Stanford has many international residents, who often move in and out of the area within a short period; they may not stay long enough to develop asthma, or they may be treated outside the area.

Figure 6.1: Residual map for reported asthma versus asthma estimated from the log-transformed linear model
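Because this model is fitted on log-transformed values, a residual is a difference of logarithms and can be converted back to a ratio of observed to predicted values. The short sketch below does this for the residual of -2.0 reported above, assuming the transformation used the natural logarithm (the report does not state the base). The function name is hypothetical.

```python
import math

def observed_to_predicted_ratio(log_residual):
    """Convert a residual on the natural-log scale to observed / predicted."""
    # residual = log(observed) - log(predicted)  =>  observed / predicted = exp(residual)
    return math.exp(log_residual)

ratio = observed_to_predicted_ratio(-2.0)
print(f"observed / predicted = {ratio:.3f}")
```

Under that assumption, a residual of -2.0 corresponds to an observed value of roughly 14% of the model's prediction.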